Clustering US Senators

Posted on Dim 23 septembre 2018 in Data Analysis

Clustering US Senators

The goal is to cluster Senators to find those how follow the mainstream of their parties, and those who doesn't.

The dataset represent the Senate votes on proposed legislation from the 114th Senate. Each row represent a Senator and each column represent a vote. A 0 in a cell means the Senator voted No on the bill, 1 means the Senator voted Yes, and 0.5 means the Senator abstained.

In [1]:
import pandas as pd
votes = pd.read_csv("114_congress.csv")
votes.head()
Out[1]:
name party state 00001 00004 00005 00006 00007 00008 00009 00010 00020 00026 00032 00038 00039 00044 00047
0 Alexander R TN 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0
1 Ayotte R NH 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0
2 Baldwin D WI 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 1.0
3 Barrasso R WY 0.0 1.0 1.0 1.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0
4 Bennet D CO 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0

Number of Senators by Party, Mean by roll

In [2]:
print(votes['party'].value_counts())
print(votes.mean())
R    54
D    44
I     2
Name: party, dtype: int64
00001    0.325
00004    0.575
00005    0.535
00006    0.945
00007    0.545
00008    0.415
00009    0.545
00010    0.985
00020    0.525
00026    0.545
00032    0.410
00038    0.480
00039    0.510
00044    0.460
00047    0.370
dtype: float64

Clustering

In [4]:
import pandas as pd
from sklearn.cluster import KMeans

kmeans_model = KMeans(n_clusters=2, random_state=1)
senator_distances = kmeans_model.fit_transform(votes.iloc[:,3:])
#distance from each cluster

Count how many Senators from each party ended up in each cluster.

In [5]:
labels = kmeans_model.labels_
print(labels)
pd.crosstab(labels , votes["party"]) 
[1 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1
 0 1 1 1 0 1 1 0 1 1 1 1 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0
 1 1 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 0 1 0 0 0 1 0]
Out[5]:
party D I R
row_0
0 41 2 0
1 3 0 54

The first cluster contains 41 Democrats, and both Independents. The second cluster contains 3 Democrats, and 54 Republicans.

It sounds like 3 Democrats are more similar to Republicans in their voting than their own party. We'll explore these 3 Senators in more depth.

In [6]:
democratic_outliers = votes[(labels == 1) & (votes["party"] == "D")]
print(democratic_outliers)
        name party state  00001  00004  00005  00006  00007  00008  00009  \
42  Heitkamp     D    ND    0.0    1.0    0.0    1.0    0.0    0.0    1.0   
56   Manchin     D    WV    0.0    1.0    0.0    1.0    0.0    0.0    1.0   
74      Reid     D    NV    0.5    0.5    0.5    0.5    0.5    0.5    0.5   

    00010  00020  00026  00032  00038  00039  00044  00047  
42    1.0    0.0    0.0    0.0    1.0    0.0    0.0    0.0  
56    1.0    1.0    0.0    0.0    1.0    1.0    0.0    0.0  
74    0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5  
In [7]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.scatter(senator_distances[:,0], senator_distances[:,1], c = labels)
plt.show()

Distance from the cluster center

Let's find extrems Senators

We cube every distance between a Senator point from the cluster center for identifying extrems

In [8]:
extremism = (senator_distances ** 3).sum(axis=1)
votes['extremism'] = extremism
votes.sort_values("extremism", inplace=True, ascending=False)
print(votes.head(10))
    
         name party state  00001  00004  00005  00006  00007  00008  00009  \
98     Wicker     R    MS    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
53   Lankford     R    OK    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
69       Paul     R    KY    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
80      Sasse     R    NE    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
26       Cruz     R    TX    0.0    1.0    1.0    0.0    1.0    0.0    1.0   
48    Johnson     R    WI    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
47    Isakson     R    GA    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
65  Murkowski     R    AK    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
64      Moran     R    KS    0.0    1.0    1.0    1.0    1.0    0.0    1.0   
30       Enzi     R    WY    0.0    1.0    1.0    1.0    1.0    0.0    1.0   

    00010  00020  00026  00032  00038  00039  00044  00047  extremism  
98    0.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.250476  
53    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
69    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
80    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
26    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  46.046873  
48    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
47    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
65    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
64    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
30    1.0    1.0    1.0    0.0    0.0    1.0    0.0    0.0  40.017540  
In [ ]: